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Abstract 

To use control charts in practice, the in-control state usually has to be estimated. This 
estimation has a detrimental effect on the performance of control charts, which is often mea- 
sured for example by the false alarm probability or the average run length. We suggest an 
adjustment of the monitoring schemes to overcome these problems. It guarantees, with a cer- 
tain probability, a conditional performance given the estimated in-control state. The suggested 
method is based on bootstrapping the data used to estimate the in-control state. The method 
applies to different types of control charts, and also works with charts based on regression 
models, survival models, etc. If a nonparametric bootstrap is used, the method is robust to 
model errors. We show large sample properties of the adjustment. The usefulness of our 
approach is demonstrated through simulation studies. 

Key words: Monitoring, CUSUM, bootstrap, guaranteed performance, confidence interval, 
control chart 



1 Introduction 



Control charts such as the Shewhart chart (Shewhart. 1931 1 and the cumulative sum (CUSUM) 



2003 LawsonandKen 2005 Woodall 2006 ) and finance (Frisen 



chart (Page 1954) have been valuable tools in many areas, including reliability (O'Connor 
Xie et al. 2002|), medicine ( Carey 



2002 



20081. See Stoumbos et al. (2000) and the special issues of "Sequential Analysis" (2007, Volume 



26, Issues 2,3) for an overview. Often, heterogeneity between observations is accounted for by using 



risk-adjusted charts based on fitted regression models (Grigg and Farewell 2004 Horvath et al 



2004 Gandy et al. 2010b 



A common convention in monitoring based on control charts is to assume the probability dis- 
tribution of in-control data to be known. In practice this usually means that the distribution is 
estimated based on a sample of in-control data and the estimation error is ignored. Examples 
of this arelSteiner et all (|2000b; iGrigg and Farewelll d2004b; iBottle and Aylinl d2008h ; iBiswas and 



Kalbfleisch 



(2008); Fouladirad et al. (2008); Sego et al (20091; Gandy et al (20101. 



However, the estimation error has a profound effect on the performance of control charts. This 



has been mentioned at several places in the literature, e.g. Jones et al. ( 2004 1 ; Albers and Kallenberg 



(2004b 


): 


Jensen et al. 


(2006 


); 



To illustrate the effect of estimation, we consider a CUSUM chart (Page 1954) with normal 
observations and estimated in-control mean. We observe a stream of independent random variables 
Xi,X 2 , ■ ■ ■ which in control have an N(fi, 1) distribution and out of control have an N(fi + A, 1) 
distribution, where A > is the shift in the mean. The chart switches from the in-control state to 
the out-of-control state at an unknown time n. The unknown in-control mean /i is estimated by 
the average jj, of n past in-control observations A_„, . . . , X-i (this is often called phase 1 of the 
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ARL=E(t|A) 

Figure 1: In-control distribution of ARL=E(r|/i) for CUSUMs for standard normally distributed 
data. The mean jj, used in the monitoring is estimated based on n past observations. The boxplots 
show the 2.5%, 10%, 25%, 50%, 75%, 90% and 97.5% quantiles.The top part of the plot shows the 
situation when estimation error is ignored. In the middle part the threshold has been chosen to 
give an unconditional ARL of 100 (averaging out the parameter estimation). In the bottom part 
the threshold is adjusted to guarantee with 90% probability an in-control ARL of at least 100. 
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monitoring; the running of the chart is called phase 2). We consider the CUSUM chart 



S t =max(0,S t -i+X t - p,- A/2), S = 

with hitting time r = inf{t > : St > c} for some threshold c > 0. 

The in-control average run length, ARL = E(r|/t, K = oo), depends on fl and is thus a random 
quantity. The top part of the plot in Figure [l] shows boxplots of its distributions with threshold 
c = 2.84, A = 1 and various numbers of past observations. If fi = /i, i.e. fi was know, this would 
give an in-control ARL of 100. The estimation error is having a substantial effect on the attained 
ARL even for large samples such asn = 1000. For further illustrations of the impact of estimation 
error see Jones et al. (20041 for CUSUM charts and Albers and Kallenberg (2004b) for Shewhart 



charts. 

So far, no general approach for taking the estimation error into account has been developed, 
but there are many special constructions for specific situations. For instance, for some charts so 



called self-starting charts (Hawkins 1987 Hawkins and Oiwell 1998 Sullivan and Jones 2002), 



maximum likelihood surveillance statistics to eliminate parameters (e.g. Frisen and Andersson 



20091, correction factors for thresholds (Albers and Kallenberg |2004b| Jones 



thresholds ( |Zhang et al. 20111 and threshold functions (Horvath et al. 2004 



2002), modified 



Aue et al. 2006 1 



have been developed. Various bootstrap schemes for specific situations have also been suggested, 



see for instance Kirch (2008); Chatterjee and Qiu (20091; Capizzi and Masarotto (2009); Huskova 



and Kirch (2010). Further, some nonparametric charts which account for the estimation error in 



past data have been proposed, see Chakraborti and Graham ( 2007 1 and references therein. Recently 



some modified charts for monitoring variance in the normal distribution with estimated parameters 
have been suggested by |Maravelakis and Castagliola (2009) and Castagliola and Maravelakis (2 011[ ). 

When addressing estimation error, the above methods mainly focus on the performance of the 
charts averaged over both the estimation of the in-control state as well as running the chart once. 
In the middle part of Figure [TJ the threshold has been chosen such that, averaged over both the 
estimation of the in-control state as well as running the chart once, the average run length is 100 
(this results in a different threshold for each n). It turns out that only a small change in the 
threshold is needed and that the distribution of the conditional ARL = E(r|/i) is only changed 
slightly. This bias correction for the ARL actually goes in the wrong direction in the sense that it 
implies more short ARLs. This is due to the ARL being substantially influenced by the right tail 



of the run length distribution, see the discussion in Section 2 of Albers and Kallenberg ( 2006 ) . 
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However, usually, after the chart parameters are estimated, the chart is run for some time 
without any reestimation of the in-control state even if the chart signals. Moreover, in some 
situations, several charts are run based on the same estimated parameters. In these situations 
the ARL conditional on the estimated in-control state is more relevant than the unconditional 
ARL. In the middle and upper part of Figure [T] one sees that the conditional ARL can be much 
lower than 100, meaning that both the unadjusted threshold and the threshold adjusted for bias 
in the unconditional ARL lead, with a substantial probability, to charts that have a considerably 
decreased time until false alarms. 

To overcome these problems we will look at the performance of the chart conditional on the 
estimated in-control distribution, averaging only over different runs of the chart. This will lead to 
the construction of charts that with high probability have an in-control distribution with desired 
properties conditional on the observed past data, thus reducing the situations in which there are 
many false alarms due to estimation error. 

The bottom part of Figure [T] shows the distribution of the in control ARL when the threshold 
for each set of past data is adjusted to guarantee an in-control ARL of at least 100 with probability 
90%. The adjustment is calculated using a bootstrap procedure explained later in the paper. The 
adjustment succeeds to avoid the too low ARLs with the prescribed probability, and we will see 
later that the cost in a higher out-of-control ARL is modest. Using hitting probabilities instead of 
ARL as criterion leads to similar results. 

Our approach is similar in spirit to the exceedance probability concept developed by Albers and 
Kallenberg for various types of Shcwhart ( Albers and Kallenberg 2004a 2005 Albers et al. 2005 1 



and negative binomial charts ( jAlbers and Kallenberg 2009 2010). They calculate approximate 



adjusted thresholds such that there is only a small prescribed probability that some performance 
measure, for instance an ARL, will be a certain amount below or above a specified target. 

The main difference between their approach and what we present is that our approach applies far 
more widely, to many different types of charts and without having to derive specific approximation 
formulas in each setting. If we apply a nonparametric bootstrap, the proposed procedure will be 
robust against model misspecification. In addition to that, our approach allows not only to adjust 
the threshold but also to give a confidence interval for the in-control performance of a chart for a 
fixed threshold. Lastly, even though not strongly advocated in this paper, the bootstrap procedure 
we propose can also be used to do a bias correction for the unconditional performance of the chart, 
as in the middle part of Figure [T] 

Next, we describe our approach more formally. Suppose we want to use a monitoring scheme 
and that the in-control distribution P of the observations is unknown, but that based on past 
in-control behaviour we have an estimate P of the in-control distribution. Let q denote the in- 
control property of the chart we want to compute, such as the ARL, the false alarm probability or 
the threshold needed for a certain ARL or false alarm probability. In the above example we were 
interested to find a threshold such that the in-control ARL is 100. 

Generally, q may depend on both the true in-control distribution P and on estimated parameters 
of this distribution which for many charts are needed to run the chart. We denote these parameters 
by £ = £(P). In the above CUSUM chart example £ = /t. We are interested in g(P;£), that is 
the in-control performance of the chart conditional on the estimated parameter. In the above 
CUSUM example, q(P;£) is the threshold needed to give an ARL of 100 if the observations are 
from the true in-control distribution P and the estimated parameter p, is used. As P is not observed 
<?(-P;£) is not observable. As mentioned above, many papers pretend that the estimated in-control 
distribution P equals the true in-control distribution P and thus use g(P;£). Our suggestion is 
to use bootstrapping of past data to construct an approximate one-sided confidence intervals for 
<j<(P;£). From this we get a guaranteed conditional performance of the control scheme. 

In Section [2] we present the general idea in the setting with homogeneous observations, and 
discuss this for Shewhart and CUSUM charts. The main theoretical results are presented in 
Section[3j with most of the proofs given in the Appendix. Section[4]contains simulations illustrating 
the performance of charts for homogeneous observations. In Section [5] extensions to charts based 
on regression and survival analysis models are presented. Some concluding comments are given 
in Section [6j The suggested methods are implemented in a flexible R-package, that will be made 
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available on the Comprehensive R Archive Network (CRAN). 

2 Monitoring homogeneous observations 
2.1 General idea 

Suppose that in control we have independent observations X\ , X 2 , ■ ■ . following an unknown dis- 
tribution P. We want to use some monitoring scheme/control chart that detects when Xi is no 
longer coming from P. The particular examples we discuss in this paper are Shewhart and CUSUM 
charts, but the methodology we suggest applies more widely. 

To run the charts, one often needs certain parameters £. For example, in the CUSUM control 
chart of the introduction, we needed £ = fi, the assumed in-control mean. These parameters will 
usually be estimated. 

Let t denote the time at which the chart signals a change. As r may depend on £, we sometimes 
write t(£). The charts we consider use a threshold c, which determines how quickly the chart signals 
(larger c lead to a later signal). 

The performance of such a control chart with the in-control distribution P and the parameters 
£ can, for example, be expressed as one of the following. 

• ARL(P; £) = E(r(£)), where E is the expectation with respect to P. 

• hit(P;£) = P(t(£) < T) for some finite T > 0, where P is the probability measure under 
which Xi, X2, • ■ ■ ~ P. This is the false alarm probability in T time units. 

• carl(-P;£) = inf{c > : ARL(P;£) > 7} for some 7 > 0. Assuming appropriate continuity, 
this is the threshold needed to give an in-control average run length of 7. 

• Chit(-P;£) = inf{c > : hit(P;£) < 0} for some < (3 < 1. This is the threshold needed to 
give a false alarm probability of (3. 

The latter two quantities are very important in practice, as they are needed to decide which 
threshold to use to run a chart. In the notation we have suppressed the dependence of the quantities 
on c, T, 7, j3 and A. 

In the following, q will denote one of ARL, hit, carl or Chit, or simple transformations such as 
log(ARL), logit(hit), log(c ARL ) and log(c hit ), where logit(x) = log 

The true in-control distribution P and the parameters £ = £(P) needed to run the chart are 
usually estimated. We assume that we have past in-control observations X_ n , . . . ,X_i (inde- 
pendent of X\ , X2, ■ ■ ■ ) , which we use to estimate the in-control distribution P parametrically or 
non-parametrically. We denote this estimate by P. The estimate of £ will be denoted by £ = £ (P) . 
For example, in the CUSUM control chart of the introduction, £ = jj, is the estimated in-control 
mean. 

The observed performance of the chart will depend on the true in-control distribution P as 
well as on the estimated parameters £ that are used to run the chart. Thus we are interested in 
q(P;£), the performance of the control chart conditional on £. This is an unknown quantity as P 
is not known. Based on the estimator q(P; £), we construct a one-sided confidence interval for this 
quantity to guarantee, with high probability, a certain performance for the chart. We choose to 
call the interval a confidence interval, even though the quantity g(P;£) is random. 

We suggest the following for guaranteeing an upper bound on q (which is relevant for q = hit, 
q = carl or q = Chit). For a G (0, 1), let p a be a constant such that 

P(q(P;0-q(P;0>p a ) = l-a, 
assuming that such a p a exists. Hence, 

P(q(P;i)<q(P;i)-p a ) = l-a. 
Thus (— oo,g(P;£) — p a ) could be considered an exact lower one-sided confidence interval of 



4 



Of course, p a is unknown. We suggest to obtain an approximation of p a via bootstrapping. 
In the following, P* denotes a parametric or non-parametric bootstrap replicate of the estimated 
in-control distribution P. We can approximate p a by p* a such that 

P(q(P*;i*)-q(P;h>Pl\P) = !-<*■ 

Thus 

(-oo,q(P;i)-p* a ) (1) 

is a one-sided (approximate) confidence interval for q(P; £). In this paper, we will use the following 
generic algorithm to implement the bootstrap. 

Algorithm 1 (Bootstrap). 

1. From the past data A_ n , . . . , X_ 1; estimate P and £. 

2. Generate bootstrap samples A^ n , . . . , Xt 1 from P. Compute the corresponding estimate P* 
and £* . Repeat B times to get P* , . . . , P|j and £*,.-., £jj ■ 

3. Let p* a be the 1 — a empirical quantile of q{P^]S,l) — q{P]£,t), b — 1, . . . ,B. 

For guaranteeing a lower bound on q, which is for example relevant for q — ARL, a similar 
upper one-sided confidence interval can be constructed. 

In a practical situation, the focus would be on deciding which threshold to use for the control 
chart to obtain desired in-control properties. We suggest to use either q = carl or q = Chit, or log 
transforms of these, and then run the chart with the adjusted threshold 

q(P;£)-p*a- (2) 

This will guarantee that in (approximately) 1 — a of the applications of this method, the control 
chart actually has the desired in-control properties. 



2.2 Specific charts 
2.2.1 Shewhart charts 



The one-sided Shewhart chart (Shewhart 19311 signals at 



r = inf{<e{l,2,...}:/(X t ,0>c} 

for some threshold c, where / is some function, X t is the observation at time t and £ are some 
parameters. X t can be a single measurement or e.g. the average, range or standard deviation of a 
specified number of measurements, or some other statistic like a proportion. It is common to use a 
Shewhart chart with a threshold of the mean plus 3 times the standard deviation, in this case one 
would use c = 3 and f = with £i being the mean and £2 being the standard deviation. 

For two-sided charts one could just use f{x,^) = ^ ■ 

Conditionally on fixed parameters £, the stopping time r follows a geometric distribution with 
parameter p — p(c; P, £) = P(f(X t , £) > c). Then the performance measures mentioned in the 
previous section simplify to 

ARL(P;£) = / hit(P;0 =1 - (1 - p(c; P, £)) T , 

carl(P;0 =V~ X (~-> P ^) and c hit (P;e)=p- 1 (l-(l-/3)*;P,e), 

where p -1 (-;P,£) is the inverse ofp(-;P,£). 

Suppose that the in-control distribution comes from a parametric family Pg,9 €E O. Further- 
more, suppose that we have some way of computing an estimate 9 of 9 based on the sample. Then 
we can use Algorithm [l] with P = Pg to compute a confidence interval as given by |lj) . 

Shewhart charts depend heavily on the tail behaviour of the distribution of the observations. 
This is particularly problematic when the sample size is small and we use non-parametric methods 
or a simple non-parametric bootstrap. We thus primarily suggest to use a parametric bootstrap 
for Shewhart charts. 
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Remark 1. In certain cases the parametric bootstrap will actually be exact when B — > oo. This 
happens when the distribution of q(Pa;£) — q(Pg;£ l ) under Pg does not depend on 9. In particular, 
this implies that q(P§*;£*) — q(Pf,^*) has the same distribution and p* a —¥ p a as B — > oo. 

As an example, consider the case when /(#,£) = an d Xt follows an N^i,^) distribution 

and q is any of the performance measures described above. We use 9 = £ and as estimator £i we 
use the sample mean and as estimator £2 we use the sample standard deviation. Then 

p(c;i l , f -) = P { (^i>c)=l-*(*±|^), 

where <E> is the cdf of the standard normal distribution, and under P^, 

where W ~ Xn-l ^ ~ AT(0, 1) are independent. Thus the distribution of p(c; P^, £), and hence 
q(P%; £), is completely known. As p(c; Pg, £) = ( Xt 7^ 1 > cj = 1 — $(0), and thus q(Pc', i), is not 

random, the distribution of q{Pe',£,) — q(P£'>£) also does not depend on any unknown parameters. 
Thus the parametric bootstrap is exact in this example. 

2.2.2 CUSUM charts 



This section considers the one-sided CUSUM chart ( Page 1954 ) . The classical CUSUM chart was 
designed to detect a shift of size A > in the mean of normally distributed observations. Let fj, 
and a denote, respectively, the in-control mean and standard deviation. A CUSUM chart can be 
defined by 

St = max(0, St-i + (X t -fx- A/2)/a), S = (3) 

with hitting time r = inf{< > : St > c] for some threshold c > 0. 

Alternatively, we could drop the scaling and not divide by the standard deviation a in ([3| . See 



Chapter 1.4 in |Hawkins and Olwell (19981 for a discussion on scaled versus unsealed CUSUMs. 



More generally, to accommodate observations with general in-control distribution with density 



fo and general out-of-control distribution with density /1, it is optimal in a certain sense (Mous- 
takides| |1986| ) to modify the CUSUM chart by replacing (X t - (J. - A/2)/ct by the log likelihood 



ratio log(fi(X t ,8)/f (X t ,6)) such that the CUSUM chart is 

St = max(0, St-,. +log(f 1 (Xt,d)/fo(X t ,d))), S Q = 0. (4) 

Let £ denote either (/x, <r) in ^ or 9 in Q. Usually, £ needs to be estimated from past data, 
and we can then use Algorithm [I] to compute a confidence interval ([I]) for the performance measure 
q(P; £). For Q it is most natural to use a parametric bootstrap with P — Pg, while for we can 
use either a parametric or a nonparametric bootstrap. In the latter case we let P be the empirical 
distribution of X- n , . . . , i.e. in Algorithm [TJ X*_ n , . . . , X*_^ are sampled with replacement 

from X_ n ,...,X_i. 

Remark 2. Similar as for Shewhart charts, this parametric bootstrap is exact when the distribution 
ofq(P§;£)—q(P0)£) does not have any unknown parameters. This is, for instance, the case if we use 
Q) for an exponential distribution with the out-of-control distribution specified as an exponential 
distribution with mean AA, where A is the in-control mean. Another example of this is when we 
have normally distributed data and use a CUSUM with the increments (X t — fi)/o — A/2. 



3 General theory 

In this section, we show that asymptotically, as the number of past observations n increases, our 
procedure works. An established way of showing asymptotic properties of bootstrap procedures is 
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via a functional delta method (van der Vaart and Wellner, 1996 Kosorok 2008). Whilst we will 



follow a similar route, our problem does not fit directly into the standard framework, because the 
quantity of interest, q{P,/;), contains the random variable £. We present the setup and the main 
result in Section 3.1 followed by examples (Section 3.2). 



3.1 Main theorem 

Let D q be the set in which P and its estimator P lie, i.e. a set describing the potential probability 
distribution of our observations. This could be a subset of K d for parametric distributions, the set 
of cumulative distribution functions for non-parametric situations, or the set of joint distributions 
of covariates and observations. We assume that D q is a subset of a complete normed vector space 
D. Let 3 be a non-empty topological space containing the potential parameters £ used for running 
the chart. In our examples, we will let 5 C M d be an open set. 

We assume that P* = P*(P, W n ) is a bootstrapped version of P based both on the observed 
data P and on an independent random vector W n . For example, when resampling with replacement 
then W n is a weight vector of length n, multinomially distributed, that determines how often a 
given observation is resampled. In a parametric bootstrap, W n is the vector of random variables 
needed to generate observations from the estimated parametric distribution. 

In the main theorem we will need that the mapping q : D q xS -> R, which returns the property 
of the chart we are interested in, satisfies the following extension of Hadamard differentiability. For 
the usual definition of Hadamard differentiability see e.g. (van der Vaart 1998 Section 20.2). The 



extension essentially consists in requiring Hadamard differentiability in the first component when 
the second component is converging. 

Definition 1. Let D,E be metric spaces, let Df C D and let 3 be a non-empty topological space. 
The family of functions {/(•;£) : Df — > E : £ £ 3} is called Hadamard differentiable at 9 £ Df 
around £ £ 3 tangentially to Do C D if there exists a continuous linear map /'(#;£) : -Do — * E 
such that 

f(0 + t n h n ;£ n )- f(0;£ n ) 



t, 



f'(6;0(h) (n^oo) 



for all sequences (£„) C 3, (t n ) C R, (h n ) C D that satisfy 9 + t n h n £ Df\/n and £ n — > £, t n — > 0, 
h„ — > h £ Dq as n — > oo. 

In the following theorem we understand convergence in distribution, denoted by — *, as defined 



van der Vaart and Wellner ( 1996 Def 1.3.3) or in Kosorok (2008 p.108) 



Theorem 1. Let q 



D q x 3 -> 



be a mapping, let P £ D q and let £ : D q 



3 be a continuous 



function. Suppose that the following conditions are satisfied. 



a) q is Hadamard differentiable at P around £ tangentially to Dq for some Dq C D. 

b) P is a sequence of random elements in D q such that y/n{P — P) Z as n — » oo where Z is 
some tight random element in Dq. 

c) Jn~(P*-P)£,Z asn 

w 



oo where ~--> denotes weak convergence conditionally on P in proba- 



bility as defined in Kosorok (2008 p.19) 



d) The cumulative distribution function of q'(P;£,)Z is continuous. 

e) Outer-almost surely, the map W n > h(P*(P,W n )) is measurable for each n and for every 
continuous bounded function h : D q — ► K. 

f) q(P,£) — q(P',0 and p*^ are random variables, i.e. measurable, where £ = £(P) and p* a — 
inf{t £ K : P(q{P*;t) ~ q{P;€*) <*)><*}• 



Then 



P(q(P; £ (-oo, q(P; £) - p* a )) 1 - a (n -> oo). 



A similar result holds for upper confidence intervals. 

The proof is in Appendix [XJ The theorem essentially is an extension of the delta-method. 
Condition a) ensures the necessary differentiability. Conditions b) and c) are standard assumptions 
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for the functional delta method; b) for the ordinary delta method and c) for the bootstrap version 
of it. Condition d) ensures that, after using an extension of the delta-method, the resulting 
confidence interval will have the correct asymptotic coverage probability. Condition e) is a technical 
measurability condition, which will be satisfied in our examples. Condition f) is a measurability 
condition, which should usually be satisfied. 



3.2 Examples 

The following sections give examples in which Theorem [T] applies. We consider hitting probabilities 
(q = hit) and thresholds to obtain certain hitting probabilities (q — Chit)- 

These examples are meant to be illustrative rather than exhaustive. For example, other para- 
metric setups could be considered along similar lines to Section |3.2.2| Furthermore, other perfor- 
mance measures such as log(chit) or logit(hit) would essentially require application of chain rules 
to show differentiability. 



3.2.1 Simple nonparametric setup for CUSUM charts 

We show how the above theorem applies to the CUSUM chart described in ^ when using a 
non-parametric bootstrap version of Algorithm [T] 

Let D = ioo(M) be the set of bounded functions R — >• R equipped with the sup- norm ||x|| = 
sup teR |x t |. Let D q C D be the set of cumulative distribution functions on R with finite sec- 
ond moment. The parameters needed to run the chart are the mean and the standard devia- 
tion of the in-control observations, thus we may choose E = Ix (0, oo) and £ : D q — > 5, P ^ 
(J xP(dx), J x 2 P{dx) - (J xP(dx)) 2 ). 

As quantities q of interest we are considering hitting probabilities (q — hit) and thresholds 
(<? = Chit) needed to achieve a certain hitting probability. The probability hit : D q x H — > R of 
hitting a threshold c > up to step T > can be written as hit(P;£) = P(m(Y) > c), where 
m(Y) = maxi^i....^ Ri{Y) is the maximum value of the chart up to time T, Ri(Y) = Y]j—i Yj — 
mma< k <i Yj is the value of the CUSUM chart at time i,Y = (Y 1 ,...,Y T ),Y t = X ^~ A / 2 
and X\, . . . , Xt ~ P are the independent observations. The threshold needed to achieve a certain 
hitting probability /3 € (0, 1) is c hit : D q x E -> R, c hit (P;£) = inf{c > : hit(P;£) < j3}. 

The setup for the nonparametric bootstrap is as follows. W n is an n-variate multinomially 
distributed random vector with probabilities 1/n and n trials. The resampled distribution is 
P* = n S^=i W n jSx_ J , where 8 X denotes the Dirac measure at x. 

The following lemma shows condition a) of Theorem [l] the Hadamard differentiability of hit 
and Chit- 

Lemma 1. For every P G D q , and every £ € Rx (0, oo), the function hit is Hadamard differentiable 
at P around £ tangentially to Dq — {H : R — > R : H continuous, lim t _ >00 H(t) = lim t _>._ 00 H(t) = 
0}. If, in addition, P has a continuous bounded positive derivative f with f(x) — > as x — > ±oo, 
then Chit is also Hadamard differentiable at P around £ tangentially to Dq . 



The proof is in Appendix |B.4| with preparatory results in Appendix |B. 1| - |B.3| 

Conditions b) and c) of Theorem [l] follow directly from empirical process theory, see e.g. 



(Kosorok 2008 p. 17, Theorems 2.6 and 2.7). Condition e) is satisfied as well, see bottom of 



p.189 and after Theorem 10.4 (p.184) of|Kosorok (2008) 



Verifying condition d) in full is outside the scope of the present paper. A starting point could 
be the fact that by the Donsker theorem, Z~GoP, where G is a Brownian bridge. 



3.2.2 CUSUM charts with normally distributed observations 

In this section, we consider a similar setup to the monitoring based on ^ considered in the previous 
subsection with the difference that we now use parametric assumptions. More specifically, the 
observations Xi follow a normal distribution with unknown mean /j, and variance a 2 . We will use 
this both for computing the properties of the chart as well as in the bootstrap, which will be a 
parametric bootstrap version of Algorithm [T] 
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The distribution of the observations can be identified with its parameters which we estimate 
by P = (//, a 2 ), where fi = ^J2i=i x -i an d o 2 = ^zjYn=i(X-i - A) 2 - The set of potential 
parameters is D q = M x (0, oo) which is a subset of the Euclidean space D = M 2 . The parameters 
needed to run the chart ^ are just the same as the one needed to update the distribution, thus 
3 = D q and £ : D q —> S, (/x, cr) i-» (/x, a) is just the identity. 

As before, we are interested in hitting probabilities within the first T steps. Using the function 
hit defined in the previous subsection, we can write the hitting probability in this parametric setup 
as hit N : D q x S — > M, (/x, <r;£) n- hit(<i> M , cr 2 ; £), where $ M ,o- 2 is the cdf of the normal distribution 
with mean /x and variance a 1 and the superscript N stands for normal distribution. Furthermore, 
using Chit from the previous subsection, the threshold needed to achieve a given hitting probability 
is i:fl,x»-) R, (/x,cr;0 i-> c h i t ($ P , CT 2 ; f). 

The resampling is a parametric resampling. To put this in the framework of the main theorem, 
we let W n = (W„i, . . . , W nn ), where W n i, . . . ,W nn ~ -^(0,1) are independent. The resampled 
parameters are then £* = I ^=1 *™ and ct; 2 = ^ Eti( X ni ~ K? where X*, = P 2 W ni + A- 

The following lemma shows that condition a) of Theorem [I] is satisfied. 

Lemma 2. For every flelx (0, 00) and every £ € K x (0, 00), £/ie functions h\t N and c^ it are 
Hadamard differentiable at 9 around £. 

The proof can be found in Appendix |B.4| using again the preparatory results of Appendix |B.1| 

Concerning the other conditions of Theorem [T] Condition b) can be shown using standard 
asymptotic theory, e.g. maximum likelihood theory, which will yield that Z is normally distributed. 
Condition c) is essentially the requirement that the parametric bootstrap of normally distributed 
data is working. As Z is a normally distributed vector, condition d) holds unless q' equals 0. 
Condition e) is satisfied, as the mapping W n H> P*(P, W n ) — (/x*,ct* 2 ) is continuous and hence 
measurable. 



3.2.3 Setup for Shewhart charts 

For Shewhart charts, the same setup as in the previous two sections can be used, the only difference 
is the choice of q. Conditions b), c) and e) are as in the previous two sections. We conjecture that 
it is possible to show the Hadamard differentiability more directly, as the properties are available 
in closed form, see Section [2.2. 1| 



4 Simulations for homogeneous observations 

We now illustrate our approach by some simulations using CUSUM charts. The simulations were 



done in R (R Development Core Team 2010) 



We use two past sample sizes, n = 50 and n = 500. The in-control distribution of X t is xV(0, 1) 
and we use 1000 replications and B = 1000 bootstrap replications. We employ both the parametric 
bootstrap and the nonparametric bootstrap mentioned in the previous sections. For the parametric 
bootstrap we used the sample mean and sample standard deviation of A_„, . . . , AT_i as estimates 
for the mean and the standard deviation of the observations. 

For the performance measures ARL, log(ARL), hit and logit(hit) we use a threshold of c = 3. 
For carl we calibrate to an ARL of 100 in control and for Chit we calibrate to a false alarm 
probability of 5% in 100 steps. 

We use the CUSUM chart ^ with A = 1 and /x and a estimated from the past data. To 
compute properties such as ARL or hitting probabilities, we use a Markov chain approximation 



(with 75 grid points), similar to the one suggested in Brook and Evans (1972). 



4.1 Coverage probabilities 

Table [I] contains coverage probabilities of nominal 90% confidence intervals. These are the one-sided 
lower confidence intervals given by 0, except for q — ARL and log(^4i?L) where the corresponding 
upper interval is used. 
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Table 1: Coverai 



,ge probabilities of nominal 90% confidence intervals for CUSUM charts. 





Parametric 


Nonparametric 


q= 


n=50 


n=500 


n=50 


n=500 


ARL 


1.000 


0.929 


0.999 


0.944 


log(ARL) 


0.928 


0.899 


0.902 


0.915 


hit 


0.923 


0.896 


0.878 


0.910 


logit(hit) 


0.892 


0.893 


0.870 


0.904 


CARL 


0.881 


0.892 


0.846 


0.893 


log(cARL) 


0.896 


0.895 


0.868 


0.904 


Chit 


0.878 


0.890 


0.843 


0.891 


log(chit) 


0.897 


0.893 


0.856 


0.901 



The standard deviation of the results is roughly 0.01. 
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Figure 2: Distribution of the conditional ARL for CUSUMs in a normal distribution setup. Thresh- 
olds are calibrated to an in-control ARL of 100. The adjusted thresholds have a guarantee of 90%. 
A log transform is used in the calibration. The boxplots show the 2.5%, 10%, 25%, 50%, 75%, 90% 
and 97.5% quantiles. The white boxplots are in-control, the gray boxplots out-of-control. 



In the parametric case, for n = 50, the coverage probabilities are somewhat off for untransformed 
versions, in particular for q = ARL. Using log or logit transformations seems to improve the 
coverage probabilities considerably. In the parametric case, for n = 500, all coverage probabilities 
seem to be fine, except for q — ARL, which although shows some marked improvement compare 
to n = 50. In the nonparametric case, a similar picture emerges, but the coverage probabilities are 
a bit worse than in the parametric case. 

Remark 3. For q = log(cARL) & n d q = log(chit) the division by a in could be skipped without 
making a difference to the coverage probabilities. Indeed, the division by a just scales the chart 
(and the resulting threshold) by a multiplicative factor, which is turned into an additive factor by 
log and which then cancels out in our adjustment. 

4.2 The benefit of an adjusted threshold 

In this section, we consider both the in- and out-of-control performance of CUSUM charts when 
adjusting the threshold c to give a guaranteed in-control ARL of 100. Setting the threshold is, in 
our opinion, the most important practical application of our method. 

Figureh^shows average run lengths for both the unadjusted threshold c(P; ft, a) and the adjusted 
threshold exp(log(c(P; fi, a)) — Pq.i), where Pq a is computed via the parametric bootstrap using 
q = log(cARL)- Thus, with 90% probability, the adjusted threshold should lead to an ARL that is 
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Figure 3: Effects of misspecification. Thresholds are calibrated to an in-control ARL of 100 and 
adjusted to the estimation error with a guarantee of 90%. A log transform is used in the calibration. 
The white boxplots are in-control, the gray boxplots are out-of-control. The boxplots show the 
2.5%, 10%, 25%, 50%, 75%, 90% and 97.5% quantiles. 



above 100. In this and in all following simulations, the out-of-control ARL refers to the situation 
where the chart is out-of-control from the beginning, i.e. from time onwards. 

For the unadjusted threshold, the desired in-control average run length is only reached in 
roughly half the cases. More importantly, for n = 50, the probability of having an in-control ARL 
of below 50 is greater than 20%. 

With the adjusted threshold we should get an average run length of at least 100 in 90% of 
the cases. This is achieved. The out-of-control ARL using the adjusted thresholds increases only 
slightly compared to the unadjusted version. 

Similarly to Remark |3j removing the scaling by a in ([3]) would not change the results of this 
section. 



4.3 Nonparametric bootstrap - advantages and disadvantages 

In this section, we compare the parametric and the non-parametric bootstrap. We consider CUSUM 
charts that are calibrated to an in-control average run length of 100 assuming a normal distribution. 
We use the adjusted threshold exp(log(cARL(-P; Ai &)) — Po.i)- 

Figure [3] shows the distribution of ARL for n — 50 and n — 500 for both the parametric 
bootstrap that assumes a normal distribution of the updates and the nonparametric bootstrap. 
We consider both a correctly specified model where X t ~ N(0, 1) as well as two misspecified 
models where X t ~ Exponential(l) and y/20X t ~ Xio ( a ^ °f the ^-t have variance 1). We show 
both the in- as well as the out-of-control performance of the charts. 

In the correctly specified model (X t ~ N(0, 1)), the performance of the parametric and the non- 
parametric chart seems to be almost identical. The only difference is a slightly worse in-control 
performance for the non-parametric chart for n = 50. 

In the misspecified model with X t ~ Exponential(l), the parametric chart does not have the 
desired in-control probabilities. The non-parametric chart seems to be doing well, in particular for 
n = 500. We have a similar results in the other misspecified model, with y/20X t ~ Xio- 
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5 Regression models 



In many monitoring situations, the units being monitored are heterogeneous, for instance when 
monitoring patients at hospitals or bank customers. To make sensible monitoring systems in such 
situations, the explainable part of the heterogeneity should be accounted for by relevant regression 
models. The resulting charts are often called risk adjusted, and an overview of some such charts 
can be found in Grigg and Farewell ( 2004| ). 

To run risk adjusted charts, the regression model needs to be estimated based on past data, and 
this estimation needs to be accounted for. Our approach for setting up charts with a guaranteed 
performance applies also to risk adjusted charts, and we will in particular look at linear, logistic 
and survival models. 



5.1 Linear models 

Suppose we have independent observations (Yi, X±), (Y^, X2), ■ ■ ■, where Yi is a response of interest 
and Xi is a corresponding vector of covariates, with the first component usually equal to 1. Let P 
denote the joint distribution of (Y^Xi) and suppose that in control E(Yi|X,-) = JQ£. From some 

observation k there is a shift in the mean response to E(3^|JQ) = A + X£ for i = k, k + 1, 

Monitoring schemes for detecting changes in regression models can naturally be based on resid- 



uals of the model, see for instance Brown et al. (1975) and Horvath et al. (2004). We can, for 



instance, define a CUSUM to monitor changes in the conditional mean of Y by 

S t = max(0, S t _i + F t - X t £ - A/2), 5 = 0, 

with hitting time r = inf{< > : St > c} for some threshold c > 0. In a similiar manner we could 
also set up charts for monitoring changes in other components of £. 

The parameter vector £ is estimated from past in control data, e.g. by the standard least squares 
estimator. We suggest to use a nonparametric version of the general Algorithm [l] with P being the 
empirical distribution putting weight 1/n on each of the past observations (Y^ n , X- n ), . . . , (Y—x, X—i). 
Resampling is then equivalent to resampling (Y* n ,X* n ), . . . , (Y* lt X* x ) by drawing with replace- 
ment from P. 

The suggested method should work even if the linear model is misspecified, i.e. E(i^| JCj) = X^ 
does not necessarily hold. The nonparametric bootstrap should take this into account. 

An analogous approach can be used for Shewhart charts. In settings where it is reasonable 
to consider the covariate vector to be non-random one could alternatively use bootstrapping of 



residuals, see for example Freedman ( 1981 ) 



5.1.1 Theoretical considerations 



Obtaining precise results is more demanding than in the examples without covariates in Section 3.2 
We only give an idea of the setup that might be used. 

The set of distributions of the observations D q can be chosen as the set of cdfs on R d+1 with 
finite second moments, where d is the dimension of the covariate. The first cdf corresponds to the 
responses, the others to the covariates. D q is contained in the vector space D = ioo(R d+1 ), the 
set of bounded functions M. d+1 — > R. The parameters needed to run the chart are the regression 
coefficients contained in the set S = M. d . These parameters are obtained from the distribution 
of the observations via £ : D q -s- S, F n- (E(X T X))- l E{XY) where {Y,X) ~ F where X is 
considered to be a row vector. 

We conjecture that the conditions of Theorem [T] are broadly satisfied if the cdf of Y — X£ is 
differentiable and if for the property q we use hitting probabilities or thresholds to achieve a given 
hitting probability. In particular, it should be possible to show Hadamard differentiability similarly 
to Lemma [I] write q as concatenation of two functions and use the chain rule in Lemma |4j The 
first mapping returns the distribution of the updates of the chart depending on F 6 D q and £ € E 
via (F;£) i-> C(Y — X^ — A), where C denotes the law of a random variable. The second takes 
the distribution of the updates and returns the property of interests. The differentiability of the 
second map has been shown in Lemmas [5] and [6] 
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Figure 4: Distribution of the conditional ARL for CUSUMs in a linear regression setup. Thresholds 
are calibrated to an in-control ARL of 100. A log transform is used in the calibration. The adjusted 
thresholds have a guarantee of 90%. The white boxplots are in control, the gray out-of-control. 
The boxplots show the 2.5%, 10%, 25%, 50%, 75%, 90% and 97.5% quantiles. 



5.1.2 Simulations 

We illustrate the performance of the bootstrapping scheme using a CUSUM and the linear in-control 
model Y = X 1 +X 2 + X 3 +e. Let e - N(0, 1), Xt ~ Bernoulli(0.4), X 2 ~ U(0, 1) and X 3 ~ N(0, 1), 
where X±, X 2 , X 3 and e are all independent. The out-of-control model is Y — 1 + X\ + X 2 +X 3 + e, 
i.e. A = 1. Figure [4] shows the distribution of the attained ARL for CUSUMs with thresholds 
calibrated to give an in control ARL of 100. We see that the behaviour of the adjusted versus 
unadjusted thresholds are very similar to what we observed for the simpler model in Figure [2j The 
coverage probabilities obtained for this regression model, not reported here, are also very similar 
to the covarage probabilities reported in Table [T] though with a tendency to be slightly worse. 



5.2 Logistic regression 

Control charts, in particular CUSUM charts, based on logistic regression models are popular for 
modelling of binary outcomes in medical contexts. See 



Lie et al. 


(1993 


). 


Steiner et al. 


(2000) 



Grigg and Farewell| ( |2004[ ) and |Woodall| ( |2006| ). 

Suppose we have independent observations (Yi,X\), (Y 2 , X 2 ) 7 . . . , where Yi is a binary response 
variable and Xi is a corresponding vector of covariates. Further, suppose that in control the log 
odds ratio is logit(P(Yi = 1|-X",)) = X^, and that from some observation k there is a shift in the 
log odds ratio to logit(P(Y 4 = 1|A;)) = A + X^ for i = k, k + 1, . . . 



2000) 



A CUSUM to monitor changes in the odds ratio can be defined by ( Steiner et al. 

S t = max(0, S t -i +Rt), 5*0 = 0, 

where Rt is the log likelihood ratio between the in-control and out-of-control model for observation 
t. More precisely 



exp(i?t) 



cxp(A + Xtt) Y '/{l + exp(A + XtQ) 
exp(A t C)^/(l + exp(A t O) 



exp(F t A) 



1 + cxp(A t g) 
1 + exp(A + XtO ■ 



The parameter vector £ is estimated from past in-control data by e.g. the standard maximum 
likelihood estimator. The same nonparametric bootstrap approach as described for the linear 
model in Section [5~T] can now be applied to this CUSUM based on this logistic regression model. 
Moreover, this approach would also apply to control charts based on other generalized linear models, 
for instance Poisson regression models for monitoring count data. The only amendment needed is 
to replace R t by the relevant log likelihood ratio. 
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Figure 5: Distribution of the conditional hitting probability for survival analysis CUSUMs. Thresh- 
olds are calibrated to an in-control hitting probability of 0.1. The adjusted thresholds have a 
guarantee of 90%. The white boxplots are in control, the gray out-of-control. The boxplots show 
the 2.5%, 10%, 25%, 50%, 75%, 90% and 97.5% quantiles. 



We have run simulations, not reported here, based on the same covariate specifications as in 
Section f5. 1.21 The results are similar to the results for the linear model of Section [5.1.21 



5.3 Survival analysis models 

Recently, risk adjusted control charts based on survival models have started to appear, see | Biswas] 



and Kalbflcisch (20081; Sego et al. (2009); Steiner and Jones (2009); Gandy et al. (2010). In none of 



these papers any adjustment for estimation error is done, but Sego et al. (20091 are illustrating, by 



simulations, the impact of estimation error on the attained average run length for the accelerated 
failure time model based CUSUM studied in their paper. 

In the following, we provide a brief simulation example of our adjustment in a survival setup 



where we use the methods described in Gandy et al. (2010). 



We observe the survival of individuals over a fixed time interval of length n (we will use n = 100 
and n = 500). Individuals arrive at times Bi (in our simulation according to a Poisson process 
with rate 1), and survive for Ti time units. Individuals may arrive before the observation interval, 
as long as Bi + Ti is after the start of the observation interval. Right-censoring, at Ci time 
units after arrival, is taking place after a maximum follow-up time of t — 60 time units or after 
the individuals leave the observation interval. In the simulation, the true hazard rate of T, is 
hi(t) = 0.1 exp(Xu + X 2 i), where Xu ~ Bernoulli(0.4) and Xu ~ N(0, 1) are covariates. 

Based on the observed data we fit a Cox proportional hazard model with Xu and Xu as 
covariates and nonparametric baseline, giving estimates /3 for the covariate effects and A (i) for 
the the integrated baseline. 

We use the CUSUM chart described in Gandy et al. ( 2010[ ) against a proportional alternative 
with p — 1.25. The parameters needed to run the chart are £ = (/3, A ) estimated by £ = (f3, A ). 
To be precise, the chart signals at time r = inf{t > : S(t) > c}, where S(t) — R(t) — inf s < t R(s), 
R(t) = log(p)N(t)-(p-l)A(t), N(t) is the number of events until time t and A(i) = £V exp(J3iXu+ 
/3 2 X 2i )A (mm((t - T^d)). 

We are interested in finding a threshold that gives a desired hitting probability, i.e. we use 
1 = c hit ■ We compute Chit (P, £) via simulations (simulate new data from P and run the chart 
with £). We estimate the threshold needed to get a 10% false alarm probability in n time units in 
control, by the 90% quantile of 500 simulations of the maximum of the chart. 

To resample, we resample individuals with replacement. We use 500 bootstrap samples. Figure 
[5] shows the distribution of the resulting hitting probabilities based on 500 simulated observation 
intervals. 
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In control, without the adjustment, the desired false alarm probability of 0.1 is only reached in 
roughly 60% of the cases. The bootstrap correction seems to work fine, leading to a false alarm 
probability of at most 10% in roughly 90% of the cases. As expected, increasing the length of the 
fitting period and the length of time the chart is run from n = 100 to n = 500 results in higher 
out-of-control hitting probabilities. 

If the length of the fitting period and the deployment period of the chart differ then a somewhat 
more complicated resampling procedure needs to be used. For example, one could resample arrival 
times and survival times/covariates separately. The former could be done by assuming a Poisson 
process as arrival time and the latter either by resampling with replacement or by sampling from 
an estimated Cox model and an estimated censoring distribution. 



6 Conclusions and discussion 

We have presented a general approach for handling estimation error in control charts with estimated 
parameters and unknown in-control distributions. Our suggestion is, by bootstrap methods, to 
tune the monitoring scheme to guarantee, with high probability, a certain conditional in-control 
performance (conditional on the estimated in-control distribution). If we apply a nonparametric 
bootstrap, the approach is robust against model specification error. 

In our opinion, focusing on a guaranteed conditional in-control performance is generally more 
relevant than focusing on some average performance, as an estimated chart usually is run for 
some time without independent reestimation. Our approach can also easily be adapted to make 
for instance bias adjustments. Bias adjustments, in contrast to guaranteed performance, tend to 
be substantially influenced by tail behaviour for heavy tailed distributions which for instance the 
average run length has. This implies that the bias adjustments need not be useful in the majority 
of cases as the main effect of the adjustment is to adjust the tail behaviour. 

We have in particular demonstrated our approach for various variants of Shewhart and CUSUM 
charts, but the general approach will apply to other charts as well. The method is generally relevant 
when the in-control distribution is unknown and the conditions of Theorem [l] hold. We conjecture 
that this will be the case for many of the most commonly used control charts. Numerous extensions 
of control charts to other settings exist, for example to other regression models, to autocorrelated 
data, to multivariate data. We do conjecture that our approach will also apply in many of these 
settings. 



A Proof of the main theorem 

The following extension of the functional delta method will help in the proof of Theorem [T] 

Lemma 3. Suppose that q : D q x S — > E is Hadamard differentiable at P G D q around £ 6 E tangentially 
to Do C D and that £ : D q — > H is continuous. Let P be a sequence of random elements in D q such that 

y/n{P~P)^Z (n^oo), 

where Z is some tight random element in Do- Then 

Vn(q(P; £(P)) - q(P; £(P))) - q'(P; £(P))Z. 

Proof. Note that Vn(q(P;i) - g(P;l)) = 9n(Vn(P - P)), where g n : D n -> P, g n (h) = y/n[q(P + 
rT^h;£(P + n~ih)) - q{P;£{P + n~ih))] and D n — {h £ D : P + n~^h G D q }. 

Let h n be a sequence such that h n G D n and h n — > h for some h G Dq. Let £ n = f(P + n~^h n ). 
The continuity of £ implies £ n — > £(P). Thus by the Hadamard differentiability of q we get g n (h„) — s> 
q'(P; £,(P))(h). Using the extended continuous mapping theorem (van der Vaart and Wellner 1996 Th 



1.11.1) finishes the proof. □ 

Proof of Theorem^ Let Z\ and Z-i be independent copies of Z . Arguing as in the first part of the proof 
of (Kosorok 2008| Theorem 12.1) one can see that unconditionally 

?H3)~(*£* 
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Applying Lemma[3]to this with the mappings (x,y;£) h-> (q(x; £), q(y; £), x, y) and (x,y) h-> £(2;) gives 



/9(-P*;n-9(-P;r)\ 

p* - p 



/g'(P;0(^i + ^2)\ 

z 2 



After that one can argue exactly as in the remainder of the proof of (Kosorok 2008 Theorem 12.1) 
p. 237, to show that 



A n := ,/n(q(P*^(P*)) - q(P;^P*))^G 



(3) 



(6) 



for G = q'(P;£)Z. Furthermore, Lemma [3] shows that 

B n :=Vn(q(P;i)-q(P;i))^G. 

Similarly to the ideas in (van der Vaart 1998 Lemma 23.3), one can show that (JsJ and (JsJ) imply 
the correct coverage probabilities. Inde ed, for any subseque nce there is a further subsequence such that 
A n -** G a.s. conditionally on P. Using 



van der Vaart 



1998 



Lemma 21.2), we get F A 1 ~» Fq 1 along this 
subsequence, where P _1 denotes the quantile function of the random variable in the subscript, i.e. F^ 1 (x) — 
inf{t G R : P(G < t) > x}. Thus for any continuity point f3 of Fq 1 , we get F^iP) F^i/l) a.s. along 
the subsequence. Thus overall, we have 

fZW)^f g \p). 

By Slutsky's lemma and jHj), B n — F^(/3) G — PJ 1 (/3). Thus, as G is continuous, 

P(Bn < -»■ P(G < Fa\P)) = /?. (7) 

This holds for all /3, because there are at most countably many points j3 at which Fq 1 is not continuous, 
because both the left- and the right-hand side of Q are monotone in /3, and because the right-hand side 
is continuous. As p* a — F A (a)/s/n, fa\ implies 



P(i(P;i) < q(P;i)-pl) = i - P(<z(P;l) - q(P;t) <pl) = l- P(B n < FT 1 (a)) -»• 1- a. 



□ 



B Proofs for Hadamard differentiability 

The main goal of this section is to prove Lemmas [I] and [2] Before we do this in Appendix |B.4| we first show 
a chain rule in Appendix |B.1| then a lemma about the uniform Hadamard differentiability of inverse maps 
in Appendix |B.2| After that we show general differentiability of hitting probability in CUSUM charts with 
respect to the updating distribution in Appendix |B.3| The results in |B.l|B3] may also be useful in other 
situations. 

B.l Chain rule 

In this section we present a chain rule for Hadamard differentiable functions. For this we need the following 
stronger version of Hadamard differentiability. 

Definition 2. Let D,E be metric spaces. A function <f> : C D — > E is called uniformly Hadamard 
differentiable at 9 G Dg along d : D x D — > R tangentially to Do C D if there exists a linear map 
(ji'e : Do — > E such that 

4>(9n + t n h n ) — 4>(0 n ) ,1 /us 
> <pg(n) 

for all 6 n — > 6 with d(6 n ,9) — > 0, t n — > and all converging sequences (h n ) with h n — > h G Do and 

8n "t" t n h n G D$. 

Lemma 4 (Chain rule). Let D, E, F be metric spaces and let H be a non-empty set. Let {/^ : Df — » 
E : £ G H} be a family of functions that is Hadamard differentiable at 8 £ Df around £ 6 S tangentially 
to Do C D. Let (f> : E$ — > F be uniformly Hadamard differentiable at f^(6) along d : E$ x E$ — > E 
tangentially to f'e{8;£){Do). Furthermore, suppose that — s> £ implies d(f(8;£„),f(6;£)) — > 0. Then 
{</> o : Df — > F : £ G H} is Hadamard differentiable at 9 around ( 6 H tangentially to Dq. 
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Proof. Let (£„) C H, (t n ) C R, (h n ) C D satisfying 8 + t„/i n G £>/ Vn and £ n -> £, t n -> 0, fr„ -> ft G Do 
as n — > oo. Let k n = — ^ fl+t "|'"- > — ^SsJ^l Hadamard differentiability of / implies k n — > q'^(h). Then by 
uniform Hadamard differentiability of 4>, 

□ 

B.2 Uniform Hadamard differentiability of the inverse map 

Let D<f, be the set of non-decreasing functions in D[u,v], for some — oo < u < v < oo, that cross P G R, i.e. 

= {_F G -D[u, v] : F non-decreasing, 3x G (u, v] : F(x—) < P < F(x)}. 

Suppose that F G and G G D<p are differentiable on [u, v] with derivatives / and g. Let G) = 
sup^g^ j — K either F or G are not differentiable on [u, v] then we set d(F, G) = oo. 

Let <j> : D,), — > R, </>(-F) = inf{a; : -F(s) > /?}, the first point at which the function crosses the threshold. 

Lemma 5. Let 8 G D</> such that 8 is differentiable on [u, v] with continuous bounded positive derivative. 
Then (f> is uniformly Hadamard differentiable at 8 along d tangentially to C[u,v]. 

Proof. Let £ = (f)(8). Let (ft n ) C D[u,v] such that h n — > h £ C[u, v]. Let (t n ) C [0,oo) such that t n — > 0. 
Let (8 n ) C Dtj, such that 8 n —¥ 8 and d(#„, 8) — > 0. Let £ n = + t n h n ). By the definition of 0, we 
have 

)<P<{9n+t n h n ){£ n ). (8) 

for every e„ > 0. Let (e n ) be positive and such that e n = o(t„). 

First, we show — > £. The sequence (ft n ) is uniformly bounded because h n — > ft and because /i is 
bounded. Thus, 

8n (£n e n ) + 0(t n ) < p < 8 n (£ n ) + 0(t n ). 

As t n — > and — > 8, 

8{£ n - e„) + o(l) < P < 8(£ n ) + o(l). 

For every S > 0, the function is bounded away from P outside (£ — S, £ + S). Furthermore, 8 is strictly 
increasing. Thus, to satisfy the previous display we must have eventually £ n > £ — S and £ n — e n < £ + 5, 
which implies £ n — > £. 

Let = 4>(9n)- Using the mean value theorem in ^ gives 

8n{£n) + (£n ~ ^n ~ £n)#n(Pln) + t n h n (£ n — 6 n (P2n) + t n h„(£„) 

for some pi„ between f n — e n and ^ n and for some p2n between and . Rewriting this using 8 n (^ n ) — P 
gives 

(Pin) < < (^n — £,n)d'„(p2n) + infen(Cii). 

By the uniform convergence of h„ and the continuity of h, we have /i n (£n — e n ) — > ft(^) and h n (^„) — > ft(C)- 
Using this, the fact that we have chosen e n such that e n — o(t n ) and that 8' n is uniformly bounded , we get 

(£n - in)8' n (pi n ) ~ o(t„) < -t„h(£) < (£„ - i n )8' n (p 2n ) + o(t n ) . 

Hence, 

_J^L _ o(1) < ^4 < _J^L + o(1) 
0„(p 2 n) UJ " u " e;(p ln ) +olij - 

We have already shown £ n — > ^ which implies pi„ — > ^ and p2n — > Using the assumptions that 
6' n —¥ 6 uniformly and that 0' is continuous shows 8' n (pi n ) — > 8'(£) and 8' n (p2 n ) — > 8'(£), which finishes the 
proof. □ 
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B.3 Differentiability of the hitting probability with respect to the up- 
dating distribution 

We are interested in hitting probabilities for CUSUM charts within the first T £ N steps. We will show that 
the mapping from the distribution of the updates Yi to the hitting probabilities is uniformly Hadamard 
differentiable. The Yi are the adjusted observations, e.g. in the notation of Section |3.2.1| they are 
Y i — x i-£i-&/ 2 ^ wnere Xi is the observed value. 

Let D,f, be the set of cumulative distribution functions on R, considered as a subset of D = Zoo(R) 
equipped with the uniform norm. Consider the mapping 

: L> -» l ao {K),q{F){c) = P(hit threshold c within T) = J g(y, c)dF( yi ) . ..dF(y T ), 



where g(y, c) = 1 (m(y) > c) with f (•) being the indicator function and m(y) is as defined in Section 3.2.1 



Lemma 6. (f> is uniformly Hadamard differentiable tangentially to Do = {H £ C(R) : linit^oo H(t) — 
limt_ ) ._ 00 H{t) — 0} with derivative 

<f>XF)(H)(c) 1 9{V,c) (n di? (%) I dH ^)- 

i=i J J 

Since H may be of infinite variation, the above integral is defined by partial integration, i.e. 

*'(f)(J0(c) = -£ / H(vi)d (fg(y,c) • 

Proof. Suppose F n F, t n — > 0, H n — > H £ Do such that F n + t n H n £ for all n. The difference 
quotient can be written as 



cl>{F n +t n H n ){c)-<t>{F n ){c) 



<=1 / /C{1,...,T} 



(9) 

r|>2 



where A/ = Jg(y,c) (Yli^i dF n (yi)) (Yln=i dH n (yij) . We first show that the second terms converges uni- 
formly in c to 0. Partial integration (applied several times) gives that for I C {1, . . . , T}, \I\ > 2, 



A/ = (-i) m | |n^(y.)j dB/(v. 



/), (io) 

where Bi = J g(y,c)\\ i( ^ I dF n (yi)- g{y, c ) is monotonically increasing in j/ thus _B/ is increasing in 
yi — (yi)i£i. Thus the total variation of Bi is bounded by 1. Hence, using (10 1, 

/ \ l J l 

t^At <t^ lsup\H n (z) 



which converges to uniformly in c. Thus the second term of |9| converges to uniformly in c. 

Next, we show that first term on the right hand side of |9}, henceforth denoted by £, converges uniformly 
in c to <j}'(F)(H). Consider the decomposition 



C-<t>'(F)(H) = c n + f^ j 9{v,c) I (\J d My,))~ll dF (yA dH ^ 

i=l y j^i j^i J 



(11) 



where C*„ = Y^ii=i / ff(2/> c ) (lTj^i dF n (yj)\ (dH n (yi) — dH(yi)). As mentioned above, H might be of infi- 
nite variation, thus C„ is defined via partial integration, i.e. 

C™ = -Xj/ (Hn{yi) - H( Vi ))d ( J g{y,c)Y[dF n { Vj )^ . 

As above, the total variation of the integrator is bounded by 1, thus \C„\ < T\\H n — H\\, which converges 
to as n — > oo. 
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We can rewrite the second term of (111 as 

ee n dp ^)) n dF ^)\- 

i=lk = l,k^i J \ j^i,j<k i^i,i>k J 

where D ik = j g(y,c)dH(yi)(dF n (y k ) - dF(y k )). Using partial integration, 

D ik =- H(yi)dg(y-i, d yi , c)(dF n (y k ) - dF(y k )) = / H(yi)(F n (y k ) - F(y k ))dg(y„ i _ k ,dy l , dy k ,c), 



where negative subscripts denote removal of the corresponding component of the vector (e.g. y~i is the 
vector y with the ith component removed). Since g is of bounded variation with respect to yi and y k 
independent of c and y-i : - k , we can bound this uniformly above by K sup z |-ff(z)| sup z \F n (z) — F(z)\ for 



some fixed K > 0. Thus, since the variation of F n — F is bounded by 2, the second term of ( 11 1 converges 
to uniformly in c. □ 

The following lemma is needed to use the result about the inverse mapping, see Lemma [5] 

Lemma 7. Let F be a cumulative distribution function with continuous bounded positive derivative f. Let 
Y — (Yi, . . . , Yt) where Yi, . . . ,Yt ~ F independently. Then the following holds. 

a) crt P(m(Y) < c) is continuously differentiable for c > (call this derivative g). 

b) g is bounded away from (at least on some compact sets), 

c) Let f n be densities that converge uniformly to f. Let Y*-™' = (Y^ n \ . . . , Y^™') ~ f n and denote the 
derivative of c ^ P(m(Y ( - rl ') < c) for c > by g^ n \ Then g^ n ' converges uniformly to g on any 
compact set K C (0, oo). 

Proof. For < i < T, let At = {y G K T : Ri(y) > i?„(y)W / i}. A t are disjoint sets with P(Y G |Ji A = 
1. Thus, P(m(Y) < c) = J2i P(m(Y) < c, Y G A») and g(c) = £f =1 ffi(c), where 

fli(c) = |- P(m(Y) <c,Ye4i) = | P(ft(Y) < c, Y G At) 
ac oc 

= -^P(Y i <c-R i - l (Y),Y€A i ) = J §- c f VlS lfee^/W*n/W^ 
= y l((yi>- • -,Vi-i,c- Ri-i(y),y i+ i, . . . ,y T ) G Ai)/(c - Ri-i(y)) Y[ f{y v )dy-t. 

The continuity of g follows because because of the dominated convergence theorem. This shows a). 
Statement b) follows from g being positive and its continuity. For c), use a telescoping sum to go from the 
product of fs to the product of / n s. Then use the dominated convergence theorem. □ 

B.4 Hadamard differentiability of hitting probability in simple examples 

Z/emmaQJ hit can be written as (f> o g, where <f> is as in Appendix |B.3| and g : D q — > D q ,g(P;£) — (x n> 
P(atf 2 +£i+A/2). 

We will show that g is Hadamard differentiable at P around £ tangentially to Do- Clearly, g is linear 
in P. Thus for t n — > 0, h„ ->■ h G D , £ n — > £, 

g(P + t n h„;£ n ) - g{P;^ n ) &s f ^ 
7^ — — — — -g(h;0 =g(h n ;£ n )-g(h;g) 

The first term converges uniformly to as h n — > h. The second term converges to as h G Do implies that 
h is uniformly continuous. 

Lemma [6] allows us to use the chain rule in Lemma[4j to show the differentiability of hit. 

The differentiability of oat can be seen as follows: £ n — > £ implies that the derivative of g(P',^ n ) 
converges uniformly to the derivative of g(P; £). As g{P; £n)'(x) = f(x£,2n + £in + A/2)^2n this is implied 
by the uniform continuity of /. Thus, by LemmaJT] the derivative of hit(P; £ n ) converges uniformly to the 
derivative of hit(P; £). Thus, the result follows using the chain rule (Lemma[4]|, the differentiability of hit, 
and the differentiability of the inverse (Lemma [5]l . □ 

LemmaM Let j:(lx (0,oo)) 2 -> D q , g(n,cr,£) = (*H $( €l+A/2 + S23: ' M )). Then, as in the proof of 
Lemma 111 hit^ = <f> o g. The proof can be completed with similar steps. □ 
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